An Efficient Clustering and Distance Based Approach for Outlier Detection
Abstract
Outlier detection is a substantial research problem in data mining that aims to uncover objects that are significantly different from, exceptional relative to, or inconsistent with the rest of the data. It has been widely researched and is used in various application domains, including tax fraud detection, network robustness analysis, network intrusion detection and medical diagnosis. In this paper we propose an efficient clustering and distance based outlier detection technique. The clustering algorithms employed for this task are PAM, CLARA and CLARANS, and a novel clustering algorithm, I-CLARANS, is proposed. The process of outlier detection is divided into two stages: in the first stage clustering is performed, and in the second stage outliers are detected. The purpose is to carry out clustering and outlier mining together in one procedure. The experimental results show that the proposed method is effective and promising in practice. We also present a comparison of the proposed algorithm with existing algorithms to validate its advantage in outlier detection.
Keywords: Outlier detection, Data Mining, Clustering, PAM, CLARA, CLARANS.
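The two-stage process described in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of I-CLARANS or of the distance criterion, so everything below is an assumption made for illustration: a simple alternating k-medoids routine stands in for PAM, and a point is flagged when its distance to its medoid exceeds a hypothetical `factor` times the mean distance in its cluster. The names `k_medoids`, `distance_outliers`, `init_medoids` and `factor` are illustrative and do not come from the paper.

```python
import numpy as np


def k_medoids(X, k, init_medoids=None, n_iter=100, seed=0):
    """Stage 1: alternating k-medoids (a lightweight stand-in for PAM)."""
    n = len(X)
    # Pairwise Euclidean distances between all points.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    if init_medoids is None:
        rng = np.random.default_rng(seed)
        medoids = rng.choice(n, size=k, replace=False)
    else:
        medoids = np.array(init_medoids)
    for _ in range(n_iter):
        # Assign every point to its nearest medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            within = D[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    dist_to_medoid = D[np.arange(n), medoids[labels]]
    return medoids, labels, dist_to_medoid


def distance_outliers(labels, dist_to_medoid, factor=3.0):
    """Stage 2: flag points far from their medoid (assumed threshold rule)."""
    flags = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        threshold = factor * dist_to_medoid[idx].mean()
        flags[idx] = dist_to_medoid[idx] > threshold
    return flags


# Toy data: two tight 3x3 grids plus one isolated point.
grid = np.array([[dx, dy] for dx in (-0.5, 0.0, 0.5) for dy in (-0.5, 0.0, 0.5)])
X = np.vstack([grid, grid + 10.0, [[0.0, 6.0]]])

medoids, labels, dist = k_medoids(X, k=2, init_medoids=[0, 9])
flags = distance_outliers(labels, dist, factor=3.0)
print("medoid indices:", medoids)
print("outlier indices:", np.flatnonzero(flags))
```

The threshold factor is a tuning knob rather than a value taken from the paper. Because the cluster mean includes the outliers themselves, a cluster with many distant points raises its own threshold, which is one reason a real implementation might prefer a more robust statistic such as the median.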
Similar resources
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
A Novel Subspace Outlier Detection Approach in High Dimensional Data Sets
Many real applications are required to detect outliers in high dimensional data sets. The major difficulty of mining outliers lies in the fact that outliers are often embedded in subspaces. No efficient methods are available in general for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormal sparse density units in s...
A Framework for Outlier Detection in Geographic Spatial Data
Outlier detection is a very interesting, useful and challenging problem in the field of data mining. Because the data are sparse, clustering algorithms that are based on distance will not work to find outliers in spatial data. The problem of finding irregular features in spatial data needs to be explored. Many existing approaches have been proposed to overcome the problem of outlier detection in spatial Geogr...
Outlier Detection Using Extreme Learning Machines Based on Quantum Fuzzy C-Means
One of the most important concerns of a data miner is always to have accurate and error-free data: data that does not contain human errors and whose records are complete and correct. In this paper, a new learning model based on an extreme learning machine neural network is proposed for outlier detection. The function of neural networks depends on various parameters such as the structur...
Support Vector Clustering for Outlier Detection
In this paper a novel support vector clustering (SVC) method for outlier detection is proposed. Outlier detection algorithms have applications in several tasks such as data mining, data preprocessing, data filtering and cleaning, time series analysis and so on. Traditionally, outlier detection methods mostly model data according to its statistical properties, and these approaches are only prefe...